AWS upgrades its 10p10u network to handle massive AI clusters
The demands on AI networks are particularly intense. DeSantis noted that during training, every server needs to talk to every other server at exactly the same time. The 10p10u network fabric is being specifically deployed in support of AWS’ UltraServer compute technology, which is being built out to run massive AI training workloads. Each Trainium2 UltraServer has almost 13 Tbps of network bandwidth, requiring a massive network fabric to prevent bottlenecks.
“The 10p10u network is massively parallel, densely interconnected, and [the] 10p10u network is elastic,” DeSantis explained. “We can scale it down to just a few racks, or we can scale it up to clusters that span several physical data center campuses.”
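To get a feel for why "every server talks to every other server" strains a fabric, the short Python sketch below counts the directed server-to-server flows in a naive all-to-all exchange. The cluster sizes are illustrative assumptions, not AWS figures, and real training jobs typically use collective algorithms that schedule traffic more cleverly, so treat the result as a rough upper bound rather than a model of the 10p10u network.

```python
def all_to_all_flows(servers: int) -> int:
    """Directed flows when each server sends to every other server once."""
    if servers < 0:
        raise ValueError("servers must be non-negative")
    return servers * (servers - 1)


# Hypothetical cluster sizes, chosen only to show the quadratic growth.
for n in (8, 64, 1024):
    print(f"{n} servers -> {all_to_all_flows(n)} simultaneous flows")
```

Going from 64 to 1,024 servers multiplies the server count by 16 but the flow count by roughly 260, which is the kind of growth a densely interconnected fabric has to absorb.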
How the 10p10u network increases optical networking density
Patch panels are a common sight in many data center networks, with a stream of cables connecting into a panel. With the complexity of the 10p10u network, AWS found that its existing patch panel approach wasn’t going to be enough. So it created something new. AWS developed a proprietary trunk connector that combines 16 separate fiber optic cables into a single connector.
“What makes this game changing is that all that complex assembly work happens at the factory, not on the data center floor, and this dramatically streamlines the installation process and virtually eliminates the risk of connection errors,” DeSantis said. “Now, while this might sound modest, its impact was significant. Using trunk connectors speeds up our install time on AI racks by 54%, not to mention making things look way neater.”
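The arithmetic behind the trunk connector can be sketched in a few lines of Python. The 16-fibers-per-connector figure comes from the article; the per-rack fiber count is a hypothetical number picked for illustration, and the sketch only counts on-site connections. It does not attempt to reproduce the 54% install-time figure.

```python
import math

FIBERS_PER_TRUNK = 16  # from the article: 16 fiber optic cables per trunk connector


def field_connections(fibers: int, fibers_per_connector: int = 1) -> int:
    """Connections a technician must make on site for a given fiber count."""
    if fibers < 0 or fibers_per_connector < 1:
        raise ValueError("invalid fiber or connector count")
    return math.ceil(fibers / fibers_per_connector)


rack_fibers = 256  # hypothetical fiber count for one AI rack (assumption)
individual = field_connections(rack_fibers)
trunked = field_connections(rack_fibers, FIBERS_PER_TRUNK)
print(f"individual cables: {individual} connections")
print(f"trunk connectors:  {trunked} connections")
```

Under these assumptions, the technician makes 16 connections instead of 256, and each one is a pre-assembled, factory-tested bundle, which is where the reduced risk of connection errors comes from.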
AWS also developed the Firefly optical plug, which further helps to improve the 10p10u network. The Firefly optical plug acts as a miniature signal reflector that allows AWS to test and verify network connections before the rack arrives on the data center floor. “That means we don’t waste any time [debugging cabling] when our servers arrive. And that matters, because in the world of AI clusters, time is literally money,” DeSantis said.
The Firefly optical plugs also act as a protective seal, which prevents dust particles from entering the optical connections. “This might sound minor, but even tiny dust particles can significantly degrade the integrity and create network performance problems,” DeSantis said.